Inverse Covariance Estimation for High-Dimensional Data in Linear Time and Space: Spectral Methods for Riccati and Sparse Models
We propose maximum likelihood estimation for learning Gaussian graphical
models with a Gaussian ($\ell_2^2$) prior on the parameters. This is in contrast
to the commonly used Laplace ($\ell_1$) prior for encouraging sparseness. We show
that our optimization problem leads to a Riccati matrix equation, which has a
closed form solution. We propose an efficient algorithm that performs a
singular value decomposition of the training data. Our algorithm runs in
$O(NT^2)$ time and $O(NT)$ space for $N$ variables and $T$ samples. Our method is
tailored to high-dimensional problems ($N \gg T$), in which sparseness-promoting
methods become intractable. Furthermore, instead of obtaining a single solution
for a specific regularization parameter, our algorithm finds the whole solution
path. We show that the method has logarithmic sample complexity under the
spiked covariance model. We also propose sparsification of the dense solution
with provable performance guarantees. We provide techniques for using our
learnt models, such as removing unimportant variables, computing likelihoods
and conditional distributions. Finally, we show promising results on several
gene expression datasets.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI2013)
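
The closed-form, SVD-based solution described in the abstract can be illustrated with a minimal sketch. It rests on an assumption not spelled out above: that the regularized objective is log det Ω − tr(SΩ) − (ρ/2)‖Ω‖_F², where S is the sample covariance, so that the stationarity condition is the Riccati equation ρΩ² + SΩ − I = 0. Under that assumption Ω shares eigenvectors with S, and each eigenvalue is the positive root of a scalar quadratic. The function names, the `rho` parameterization, and the factored return format are hypothetical and chosen only for illustration.

```python
import numpy as np


def riccati_precision(X, rho):
    """Return low-rank factors of the precision matrix estimate.

    X   : (N, T) array, N variables by T samples, rows already centered.
    rho : positive regularization weight (assumed parameterization).

    The estimate is Omega = base * I + U @ diag(d) @ U.T, kept in factored
    form so that storage stays O(NT).
    """
    N, T = X.shape
    # Thin SVD: U has shape (N, min(N, T)); costs O(N T^2) when N >= T.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    lam = s ** 2 / T  # eigenvalues of the sample covariance X X^T / T
    # Positive root of rho * w^2 + lam * w - 1 = 0, in a cancellation-free form.
    w = 2.0 / (lam + np.sqrt(lam ** 2 + 4.0 * rho))
    base = 1.0 / np.sqrt(rho)  # eigenvalue in directions where lam = 0
    return U, w - base, base


def precision_times(U, d, base, v):
    """Multiply the factored precision matrix by a vector in O(NT) time."""
    return base * v + U @ (d * (U.T @ v))
```

Because the SVD does not depend on `rho`, recomputing `w` for a grid of `rho` values reuses the same `U` and `s`, which is one plausible reading of how a whole solution path could be obtained cheaply.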
Molding CNNs for text: non-linear, non-consecutive convolutions
The success of deep learning often derives from well-chosen operational
building blocks. In this work, we revise the temporal convolution operation in
CNNs to better adapt it to text processing. Instead of concatenating word
representations, we appeal to tensor algebra and use low-rank n-gram tensors to
directly exploit interactions between words already at the convolution stage.
Moreover, we extend the n-gram convolution to non-consecutive words to
recognize patterns with intervening words. Through a combination of low-rank
tensors and pattern weighting, we can efficiently evaluate the resulting
convolution operation via dynamic programming. We test the resulting
architecture on standard sentiment classification and news categorization
tasks. Our model achieves state-of-the-art performance both in terms of
accuracy and training speed. For instance, we obtain 51.2% accuracy on the
fine-grained sentiment classification task
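
To illustrate how a non-consecutive convolution can be evaluated by dynamic programming, here is a minimal sketch restricted to bigrams. It assumes a rank-r factorization in which each word vector is projected by two matrices and the projections are multiplied elementwise, and it assumes that "pattern weighting" means a geometric decay applied once per skipped word. The names `P`, `Q`, and `decay` are hypothetical, and the exact formulation in the paper may differ.

```python
import numpy as np


def nonconsecutive_bigram_features(X, P, Q, decay):
    """Decay-weighted low-rank bigram features for every position.

    X     : (L, d) array of word vectors for a sentence of length L.
    P, Q  : (d, r) projection matrices (rank-r factors, hypothetical names).
    decay : weight in [0, 1) applied once per skipped word (assumed form).

    Row j of the result is the sum over i < j of
    decay**(j - i - 1) * (x_i @ P) * (x_j @ Q), with elementwise products.
    """
    L = X.shape[0]
    r = P.shape[1]
    left = X @ P    # (L, r) projections for the first word of each bigram
    right = X @ Q   # (L, r) projections for the second word
    out = np.zeros((L, r))
    running = np.zeros(r)  # decayed sum of left projections seen so far
    for j in range(L):
        out[j] = running * right[j]
        running = decay * running + left[j]
    return out
```

The running sum replaces the explicit double loop over word pairs, so after the two projections the cost is linear in sentence length rather than quadratic.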
- …